Overview

Dataset statistics

Number of variables: 22
Number of observations: 73,908
Missing cells: 288,306
Missing cells (%): 17.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.4 MiB
Average record size in memory: 176.0 B
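The missing-cell total can be cross-checked against the per-column Missing alerts: six columns are each missing 35,733 values and ehail_fee is missing all 73,908. A minimal Python sketch of that arithmetic, using only counts quoted in this report:

```python
# Per-column missing counts taken from the Missing alerts in this report.
missing_per_column = {
    "store_and_fwd_flag": 35733,
    "RatecodeID": 35733,
    "passenger_count": 35733,
    "payment_type": 35733,
    "trip_type": 35733,
    "congestion_surcharge": 35733,
    "ehail_fee": 73908,
}
n_rows = 73908
n_cols = 22

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(missing_cells)          # 288306
print(f"{missing_pct:.1f}%")  # 17.7%
```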

Variable types

Numeric: 11
Categorical: 7
DateTime: 2
Boolean: 1
Unsupported: 1

Alerts

df_index is highly correlated with fare_amount and 2 other fields (High correlation)
RatecodeID is highly correlated with mta_tax and 1 other field (High correlation)
trip_distance is highly correlated with fare_amount and 2 other fields (High correlation)
fare_amount is highly correlated with df_index and 4 other fields (High correlation)
mta_tax is highly correlated with df_index and 4 other fields (High correlation)
tip_amount is highly correlated with payment_type (High correlation)
total_amount is highly correlated with df_index and 4 other fields (High correlation)
payment_type is highly correlated with tip_amount (High correlation)
trip_type is highly correlated with RatecodeID and 1 other field (High correlation)
duration is highly correlated with trip_distance and 2 other fields (High correlation)
df_index is highly correlated with mta_tax (High correlation)
RatecodeID is highly correlated with mta_tax and 1 other field (High correlation)
fare_amount is highly correlated with total_amount and 1 other field (High correlation)
mta_tax is highly correlated with df_index and 2 other fields (High correlation)
tip_amount is highly correlated with payment_type (High correlation)
tolls_amount is highly correlated with total_amount (High correlation)
total_amount is highly correlated with fare_amount and 2 other fields (High correlation)
payment_type is highly correlated with tip_amount (High correlation)
trip_type is highly correlated with RatecodeID and 1 other field (High correlation)
duration is highly correlated with fare_amount and 1 other field (High correlation)
df_index is highly correlated with mta_tax (High correlation)
RatecodeID is highly correlated with mta_tax and 1 other field (High correlation)
trip_distance is highly correlated with fare_amount and 2 other fields (High correlation)
fare_amount is highly correlated with trip_distance and 2 other fields (High correlation)
mta_tax is highly correlated with df_index and 2 other fields (High correlation)
tip_amount is highly correlated with payment_type (High correlation)
total_amount is highly correlated with trip_distance and 2 other fields (High correlation)
payment_type is highly correlated with tip_amount (High correlation)
trip_type is highly correlated with RatecodeID and 1 other field (High correlation)
duration is highly correlated with trip_distance and 2 other fields (High correlation)
RatecodeID is highly correlated with mta_tax and 1 other field (High correlation)
mta_tax is highly correlated with RatecodeID and 2 other fields (High correlation)
improvement_surcharge is highly correlated with mta_tax (High correlation)
trip_type is highly correlated with RatecodeID and 1 other field (High correlation)
df_index is highly correlated with extra and 2 other fields (High correlation)
RatecodeID is highly correlated with mta_tax and 1 other field (High correlation)
fare_amount is highly correlated with mta_tax and 2 other fields (High correlation)
extra is highly correlated with df_index and 2 other fields (High correlation)
mta_tax is highly correlated with df_index and 6 other fields (High correlation)
tip_amount is highly correlated with df_index (High correlation)
improvement_surcharge is highly correlated with fare_amount and 3 other fields (High correlation)
total_amount is highly correlated with fare_amount and 2 other fields (High correlation)
trip_type is highly correlated with RatecodeID and 1 other field (High correlation)
store_and_fwd_flag has 35733 (48.3%) missing values (Missing)
RatecodeID has 35733 (48.3%) missing values (Missing)
passenger_count has 35733 (48.3%) missing values (Missing)
ehail_fee has 73908 (100.0%) missing values (Missing)
payment_type has 35733 (48.3%) missing values (Missing)
trip_type has 35733 (48.3%) missing values (Missing)
congestion_surcharge has 35733 (48.3%) missing values (Missing)
trip_distance is highly skewed (γ1 = 72.23375746) (Skewed)
df_index has unique values (Unique)
ehail_fee has an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
trip_distance has 1626 (2.2%) zeros (Zeros)
extra has 45658 (61.8%) zeros (Zeros)
tip_amount has 33957 (45.9%) zeros (Zeros)
tolls_amount has 67701 (91.6%) zeros (Zeros)

Reproduction

Analysis started: 2022-07-28 08:53:58.753562
Analysis finished: 2022-07-28 08:54:15.622245
Duration: 16.87 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
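The Duration line is the gap between the two timestamps above, which can be confirmed with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above.
started = datetime.fromisoformat("2022-07-28 08:53:58.753562")
finished = datetime.fromisoformat("2022-07-28 08:54:15.622245")

elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")  # 16.87 seconds
```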

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 73908
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38742.87852
Minimum: 0
Maximum: 76517
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 3933.35
Q1: 19621.75
Median: 39181.5
Q3: 57849.25
95th percentile: 72778.65
Maximum: 76517
Range: 76517
Interquartile range (IQR): 38227.5

Descriptive statistics

Standard deviation: 22076.00452
Coefficient of variation (CV): 0.5698080619
Kurtosis: -1.198774571
Mean: 38742.87852
Median Absolute Deviation (MAD): 19109
Skewness: -0.03185112273
Sum: 2863408666
Variance: 487349975.8
Monotonicity: Strictly increasing
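Several of the derived statistics above follow from others: IQR is Q3 minus Q1, CV is the standard deviation divided by the mean, and MAD is the median of absolute deviations from the median. The sketch below recomputes them with Python's `statistics` module. It assumes the report uses linear interpolation for quartiles (reproduced here with `method="inclusive"`) and the sample standard deviation; the helper name `summarize` is hypothetical and this is not pandas-profiling's own code.

```python
import statistics


def summarize(values):
    """Recompute a few of the report's numeric summary statistics.

    Assumes linear interpolation between order statistics for quartiles
    (statistics.quantiles with method="inclusive") and the sample
    standard deviation (n - 1 in the denominator).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return {
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "Range": max(values) - min(values),
        "CV": std / mean,
        # Median absolute deviation around the median.
        "MAD": statistics.median(abs(x - median) for x in values),
    }


print(summarize([1, 2, 3, 4, 5]))

# Cross-check with the df_index figures reported above.
print(57849.25 - 19621.75)                  # IQR: 38227.5
print(round(22076.00452 / 38742.87852, 6))  # CV: 0.569808
```

On the full column, `summarize(df["df_index"].tolist())` should land close to the reported figures, assuming `df` (a hypothetical name) is the profiled DataFrame.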

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
51730 | 1 | < 0.1%
51638 | 1 | < 0.1%
51637 | 1 | < 0.1%
51636 | 1 | < 0.1%
51635 | 1 | < 0.1%
51634 | 1 | < 0.1%
51633 | 1 | < 0.1%
51631 | 1 | < 0.1%
51630 | 1 | < 0.1%
Other values (73898) | 73898 | > 99.9%
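
Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
7 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
76517 | 1 | < 0.1%
76516 | 1 | < 0.1%
76515 | 1 | < 0.1%
76514 | 1 | < 0.1%
76513 | 1 | < 0.1%
76512 | 1 | < 0.1%
76511 | 1 | < 0.1%
76510 | 1 | < 0.1%
76509 | 1 | < 0.1%
76508 | 1 | < 0.1%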

VendorID
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 577.5 KiB
Top values: 2 (66922), 1 (6986)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 73,908
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 66922 | 90.5%
1 | 6986 | 9.5%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
2 | 66922 | 90.5%
1 | 6986 | 9.5%

Most occurring characters

Value | Count | Frequency (%)
2 | 66922 | 90.5%
1 | 6986 | 9.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 73908 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 66922 | 90.5%
1 | 6986 | 9.5%

Most occurring scripts

Value | Count | Frequency (%)
Common | 73908 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 73908 | 100.0%


DateTime

Distinct: 56331
Distinct (%): 76.2%
Missing: 0
Missing (%): 0.0%
Memory size: 577.5 KiB
Minimum: 2009-01-01 00:03:25
Maximum: 2021-01-31 23:46:45
[Figure: Histogram with fixed size bins (bins=50)]

DateTime

Distinct: 56402
Distinct (%): 76.3%
Missing: 0
Missing (%): 0.0%
Memory size: 577.5 KiB
Minimum: 2009-01-01 00:12:25
Maximum: 2021-01-31 23:57:08
[Figure: Histogram with fixed size bins (bins=50)]

store_and_fwd_flag
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Memory size: 144.5 KiB
Top values: False (37944), True (231), missing (35733)

Value | Count | Frequency (%)
False | 37944 | 51.3%
True | 231 | 0.3%
(Missing) | 35733 | 48.3%

RatecodeID
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Memory size: 577.5 KiB
Top values: 1.0 (37358), 5.0 (756), 2.0 (29), 4.0 (28), 3.0 (4)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 114,525
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 37358 | 50.5%
5.0 | 756 | 1.0%
2.0 | 29 | < 0.1%
4.0 | 28 | < 0.1%
3.0 | 4 | < 0.1%
(Missing) | 35733 | 48.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
1.0 | 37358 | 97.9%
5.0 | 756 | 2.0%
2.0 | 29 | 0.1%
4.0 | 28 | 0.1%
3.0 | 4 | < 0.1%
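The Common Values and Category Frequency Plot tables list the same counts with different percentages because they use different denominators: the first divides by all 73,908 rows (so the missing rows get their own 48.3% share), while the second appears to divide by the 38,175 non-missing rows only. A minimal Python sketch reproducing both columns for RatecodeID under that assumption:

```python
# RatecodeID counts copied from the tables above; 35,733 rows are missing.
counts = {"1.0": 37358, "5.0": 756, "2.0": 29, "4.0": 28, "3.0": 4}
n_missing = 35733

n_present = sum(counts.values())  # 38175 non-missing rows
n_total = n_present + n_missing   # 73908 rows in the dataset

for value, count in counts.items():
    share_all = 100 * count / n_total        # Common Values column
    share_present = 100 * count / n_present  # Category Frequency Plot column
    print(f"{value}: {share_all:.1f}% of all rows, {share_present:.1f}% of non-missing rows")
```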

Most occurring characters

Value | Count | Frequency (%)
. | 38175 | 33.3%
0 | 38175 | 33.3%
1 | 37358 | 32.6%
5 | 756 | 0.7%
2 | 29 | < 0.1%
4 | 28 | < 0.1%
3 | 4 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 76350 | 66.7%
Other Punctuation | 38175 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 38175 | 50.0%
1 | 37358 | 48.9%
5 | 756 | 1.0%
2 | 29 | < 0.1%
4 | 28 | < 0.1%
3 | 4 | < 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 38175 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 114525 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 114525 | 100.0%


PULocationID
Real number (ℝ≥0)

Distinct: 250
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.0945094
Minimum: 3
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 3
5th percentile: 22
Q1: 52
Median: 76
Q3: 166
95th percentile: 244
Maximum: 265
Range: 262
Interquartile range (IQR): 114

Descriptive statistics

Standard deviation: 70.85340385
Coefficient of variation (CV): 0.649468101
Kurtosis: -0.7941199081
Mean: 109.0945094
Median Absolute Deviation (MAD): 35
Skewness: 0.6672850804
Sum: 8062957
Variance: 5020.204837
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
74 | 6555 | 8.9%
75 | 6063 | 8.2%
41 | 4071 | 5.5%
42 | 2683 | 3.6%
244 | 2554 | 3.5%
95 | 2006 | 2.7%
97 | 1927 | 2.6%
166 | 1889 | 2.6%
65 | 1325 | 1.8%
43 | 1320 | 1.8%
Other values (240) | 43515 | 58.9%
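Each Common values table lists the ten most frequent values and folds the rest into a single "Other values (k)" row, where k is the number of remaining distinct values (here 250 - 10 = 240) and the count is whatever the top ten do not cover. A sketch of that bookkeeping with `collections.Counter`; the helper `common_values` and its `top_n` parameter are illustrative, not pandas-profiling's API:

```python
from collections import Counter


def common_values(values, top_n=10):
    """Return the top_n (value, count) pairs plus an 'Other values' summary row."""
    counts = Counter(values)
    top = counts.most_common(top_n)
    other_distinct = len(counts) - len(top)
    other_count = sum(counts.values()) - sum(c for _, c in top)
    return top, (f"Other values ({other_distinct})", other_count)


# Toy data: 12 distinct location IDs, so two of them end up in "Other values".
sample = [74] * 5 + [75] * 4 + [41] * 3 + list(range(100, 108)) + [42]
top, other = common_values(sample)
print(top[:3])  # [(74, 5), (75, 4), (41, 3)]
print(other)    # ('Other values (2)', 2)

# Same bookkeeping with the PULocationID numbers reported above.
top_ten_total = 6555 + 6063 + 4071 + 2683 + 2554 + 2006 + 1927 + 1889 + 1325 + 1320
print(73908 - top_ten_total)  # 43515, the "Other values (240)" count
```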
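
Minimum 10 values

Value | Count | Frequency (%)
3 | 194 | 0.3%
4 | 27 | < 0.1%
7 | 1171 | 1.6%
8 | 1 | < 0.1%
9 | 82 | 0.1%
10 | 273 | 0.4%
11 | 96 | 0.1%
12 | 1 | < 0.1%
13 | 9 | < 0.1%
14 | 403 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
265 | 119 | 0.2%
264 | 32 | < 0.1%
263 | 127 | 0.2%
262 | 25 | < 0.1%
261 | 9 | < 0.1%
260 | 216 | 0.3%
259 | 149 | 0.2%
258 | 116 | 0.2%
257 | 85 | 0.1%
256 | 66 | 0.1%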

DOLocationID
Real number (ℝ≥0)

Distinct: 256
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 130.4430508
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 24
Q1: 65
Median: 129
Q3: 197
95th percentile: 248
Maximum: 265
Range: 264
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 76.94977646
Coefficient of variation (CV): 0.589910892
Kurtosis: -1.310876723
Mean: 130.4430508
Median Absolute Deviation (MAD): 67
Skewness: 0.1850186625
Sum: 9640785
Variance: 5921.268098
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
74 | 3042 | 4.1%
75 | 2716 | 3.7%
42 | 2598 | 3.5%
41 | 2164 | 2.9%
236 | 1417 | 1.9%
61 | 1351 | 1.8%
238 | 1329 | 1.8%
166 | 1283 | 1.7%
244 | 1104 | 1.5%
263 | 1094 | 1.5%
Other values (246) | 55810 | 75.5%
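
Minimum 10 values

Value | Count | Frequency (%)
1 | 7 | < 0.1%
2 | 7 | < 0.1%
3 | 151 | 0.2%
4 | 91 | 0.1%
6 | 1 | < 0.1%
7 | 627 | 0.8%
8 | 2 | < 0.1%
9 | 87 | 0.1%
10 | 353 | 0.5%
11 | 87 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
265 | 259 | 0.4%
264 | 65 | 0.1%
263 | 1094 | 1.5%
262 | 464 | 0.6%
261 | 72 | 0.1%
260 | 303 | 0.4%
259 | 163 | 0.2%
258 | 151 | 0.2%
257 | 128 | 0.2%
256 | 124 | 0.2%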

passenger_count
Real number (ℝ≥0)

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.196123117
Minimum: 0
Maximum: 6
Zeros: 110
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7626378212
Coefficient of variation (CV): 0.6375914069
Kurtosis: 22.01311368
Mean: 1.196123117
Median Absolute Deviation (MAD): 0
Skewness: 4.609093394
Sum: 45662
Variance: 0.5816164463
Monotonicity: Not monotonic
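The mean here is taken over the 38,175 rows that actually have a passenger count; the 35,733 missing rows are skipped rather than treated as zero (pandas skips NaN by default). Rebuilding Sum and Mean from the value counts in the Common values table confirms this:

```python
# passenger_count value counts copied from the Common values table;
# the 35,733 missing rows are left out on purpose.
counts = {0: 110, 1: 34379, 2: 2227, 3: 314, 4: 150, 5: 683, 6: 312}

n_present = sum(counts.values())  # rows that have a value
total = sum(value * count for value, count in counts.items())
mean = total / n_present

print(n_present, total, round(mean, 6))  # 38175 45662 1.196123
```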
[Figure: Histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
1 | 34379 | 46.5%
2 | 2227 | 3.0%
5 | 683 | 0.9%
3 | 314 | 0.4%
6 | 312 | 0.4%
4 | 150 | 0.2%
0 | 110 | 0.1%
(Missing) | 35733 | 48.3%
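
Minimum 10 values

Value | Count | Frequency (%)
0 | 110 | 0.1%
1 | 34379 | 46.5%
2 | 2227 | 3.0%
3 | 314 | 0.4%
4 | 150 | 0.2%
5 | 683 | 0.9%
6 | 312 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
6 | 312 | 0.4%
5 | 683 | 0.9%
4 | 150 | 0.2%
3 | 314 | 0.4%
2 | 2227 | 3.0%
1 | 34379 | 46.5%
0 | 110 | 0.1%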

trip_distance
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2794
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.0477196
Minimum: 0
Maximum: 244152.01
Zeros: 1626
Zeros (%): 2.2%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.5
Q1: 1.34
Median: 2.6
Q3: 5.68
95th percentile: 15.36
Maximum: 244152.01
Range: 244152.01
Interquartile range (IQR): 4.34

Descriptive statistics

Standard deviation: 1958.08235
Coefficient of variation (CV): 46.56809856
Kurtosis: 6401.155255
Mean: 42.0477196
Median Absolute Deviation (MAD): 1.59
Skewness: 72.23375746
Sum: 3107662.86
Variance: 3834086.49
Monotonicity: Not monotonic
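A skewness of 72.2 and a kurtosis of 6,401, against a median of only 2.6, suggest that a handful of extreme distances (several above 69,000 appear in the maximum-values list for this column) dominate the moments. The sketch below computes the bias-corrected sample skewness (G1) and excess kurtosis (G2), the estimators behind pandas' `Series.skew()` and `Series.kurt()`; that the report uses exactly these is an assumption. The toy data shows how a single outlier inflates both:

```python
import math


def skew_and_kurtosis(values):
    """Bias-corrected sample skewness (G1) and excess kurtosis (G2).

    These match the estimators behind pandas' Series.skew() and
    Series.kurt(); that the report relies on them is an assumption.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    # Biased (divide-by-n) central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = math.sqrt(n * (n - 1)) / (n - 2) * (m3 / m2 ** 1.5)
    kurt = ((n + 1) * (n - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))
    return skew, kurt


typical = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # evenly spaced, symmetric
print(skew_and_kurtosis(typical))            # skewness 0.0, negative kurtosis
print(skew_and_kurtosis(typical + [500.0]))  # one extreme value: both jump sharply
```

Because both statistics raise deviations to the third and fourth power, one value thousands of times larger than the median outweighs everything else, whereas the median-based MAD (1.59 here) barely moves.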

[Figure: Histogram with fixed size bins (bins=50)]
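With fixed-size bins, the 50 bars split the full range from 0 to 244,152.01 into equal widths, so each bar is about 4,883 wide and, given a 95th percentile of 15.36, well over 95% of trips fall into the first bar. A minimal sketch of equal-width bin edges (the same idea as `numpy.histogram` with an integer `bins`); `fixed_width_edges` is a hypothetical helper, and the sketch assumes the report bins the raw range without trimming outliers:

```python
def fixed_width_edges(minimum, maximum, bins):
    """Return the bins + 1 edges of equal-width bins covering [minimum, maximum]."""
    width = (maximum - minimum) / bins
    return [minimum + i * width for i in range(bins + 1)]


# trip_distance range taken from the quantile statistics for this column.
edges = fixed_width_edges(0.0, 244152.01, 50)

print(len(edges))                     # 51
print(round(edges[1] - edges[0], 4))  # 4883.0402

# Every reported quantile up to the 95th percentile lands in the first bin.
print(all(edges[0] <= q < edges[1] for q in (0.5, 1.34, 2.6, 5.68, 15.36)))  # True
```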

Common values

Value | Count | Frequency (%)
0 | 1626 | 2.2%
1.3 | 442 | 0.6%
1.4 | 430 | 0.6%
1.2 | 417 | 0.6%
1.1 | 406 | 0.5%
1 | 395 | 0.5%
0.9 | 375 | 0.5%
1.5 | 365 | 0.5%
1.7 | 353 | 0.5%
0.8 | 337 | 0.5%
Other values (2784) | 68762 | 93.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1626 | 2.2%
0.01 | 27 | < 0.1%
0.02 | 11 | < 0.1%
0.03 | 15 | < 0.1%
0.04 | 15 | < 0.1%
0.05 | 13 | < 0.1%
0.06 | 16 | < 0.1%
0.07 | 8 | < 0.1%
0.08 | 10 | < 0.1%
0.09 | 8 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
244152.01 | 1 | < 0.1%
182840.32 | 1 | < 0.1%
150672.01 | 1 | < 0.1%
144972.47 | 1 | < 0.1%
144948.19 | 1 | < 0.1%
129402.5 | 1 | < 0.1%
114653.63 | 1 | < 0.1%
105286.83 | 1 | < 0.1%
76058.91 | 1 | < 0.1%
69045.73 | 1 | < 0.1%

fare_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3277
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.51612789
Minimum: -280
Maximum: 280
Zeros: 76
Zeros (%): 0.1%
Negative: 72
Negative (%): 0.1%
Memory size: 577.5 KiB

Quantile statistics

Minimum: -280
5th percentile: 5
Q1: 9
Median: 16.83
Q3: 25.21
95th percentile: 46.88
Maximum: 280
Range: 560
Interquartile range (IQR): 16.21

Descriptive statistics

Standard deviation: 13.45626445
Coefficient of variation (CV): 0.6894945826
Kurtosis: 8.635783862
Mean: 19.51612789
Median Absolute Deviation (MAD): 7.83
Skewness: 1.409656861
Sum: 1442397.98
Variance: 181.071053
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
7 | 2204 | 3.0%
6.5 | 1930 | 2.6%
6 | 1883 | 2.5%
8 | 1753 | 2.4%
7.5 | 1705 | 2.3%
5.5 | 1679 | 2.3%
5 | 1513 | 2.0%
8.5 | 1511 | 2.0%
9 | 1431 | 1.9%
10 | 1244 | 1.7%
Other values (3267) | 57055 | 77.2%

Minimum 10 values

Value | Count | Frequency (%)
-280 | 1 | < 0.1%
-120 | 1 | < 0.1%
-52 | 1 | < 0.1%
-33.87 | 1 | < 0.1%
-28 | 1 | < 0.1%
-25 | 3 | < 0.1%
-21.16 | 1 | < 0.1%
-15.98 | 1 | < 0.1%
-15 | 1 | < 0.1%
-13.44 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
280 | 1 | < 0.1%
171 | 1 | < 0.1%
166 | 1 | < 0.1%
150 | 1 | < 0.1%
125 | 1 | < 0.1%
120.5 | 1 | < 0.1%
120 | 3 | < 0.1%
111 | 1 | < 0.1%
110 | 1 | < 0.1%
106.91 | 1 | < 0.1%

extra
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7556010175
Minimum: -5.5
Maximum: 8.25
Zeros: 45658
Zeros (%): 61.8%
Negative: 29
Negative (%): < 0.1%
Memory size: 577.5 KiB

Quantile statistics

Minimum: -5.5
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 2.75
Maximum: 8.25
Range: 13.75
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.19422487
Coefficient of variation (CV): 1.58049664
Kurtosis: 1.290200132
Mean: 0.7556010175
Median Absolute Deviation (MAD): 0
Skewness: 1.462910674
Sum: 55844.96
Variance: 1.426173039
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=15)]

Common values

Value | Count | Frequency (%)
0 | 45658 | 61.8%
2.75 | 14409 | 19.5%
1 | 7156 | 9.7%
0.5 | 5027 | 6.8%
5.5 | 687 | 0.9%
3.75 | 393 | 0.5%
3.25 | 261 | 0.4%
1.35 | 258 | 0.3%
4.09 | 24 | < 0.1%
-0.5 | 16 | < 0.1%
Other values (5) | 19 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-5.5 | 1 | < 0.1%
-2.75 | 3 | < 0.1%
-1 | 9 | < 0.1%
-0.5 | 16 | < 0.1%
0 | 45658 | 61.8%
0.5 | 5027 | 6.8%
1 | 7156 | 9.7%
1.35 | 258 | 0.3%
2.75 | 14409 | 19.5%
3.25 | 261 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
8.25 | 2 | < 0.1%
5.5 | 687 | 0.9%
4.5 | 4 | < 0.1%
4.09 | 24 | < 0.1%
3.75 | 393 | 0.5%
3.25 | 261 | 0.4%
2.75 | 14409 | 19.5%
1.35 | 258 | 0.3%
1 | 7156 | 9.7%
0.5 | 5027 | 6.8%

mta_tax
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 577.5 KiB
Top values: 0.5 (37463), 0.0 (36386), -0.5 (59)

Length

Max length: 4
Median length: 3
Mean length: 3.00079829
Min length: 3

Characters and Unicode

Total characters: 221,783
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: 0.5
4th row: 0.5
5th row: 0.5

Common Values

Value | Count | Frequency (%)
0.5 | 37463 | 50.7%
0.0 | 36386 | 49.2%
-0.5 | 59 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
0.5 | 37522 | 50.8%
0.0 | 36386 | 49.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 110294 | 49.7%
. | 73908 | 33.3%
5 | 37522 | 16.9%
- | 59 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 147816 | 66.6%
Other Punctuation | 73908 | 33.3%
Dash Punctuation | 59 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 110294 | 74.6%
5 | 37522 | 25.4%

Other Punctuation
Value | Count | Frequency (%)
. | 73908 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 59 | 100.0%
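The Characters and Unicode breakdown treats each value as a string, tallies its characters, and groups them by Unicode general category (Nd = Decimal Number, Po = Other Punctuation, Pd = Dash Punctuation). A minimal sketch using the standard `unicodedata` module and the mta_tax value counts from Common Values; it assumes every non-missing row contributes its full string, and `CATEGORY_NAMES` is a hypothetical lookup for the report's display names:

```python
import unicodedata
from collections import Counter

# mta_tax value counts copied from the Common Values table above.
value_counts = {"0.5": 37463, "0.0": 36386, "-0.5": 59}

# Display names for the Unicode general categories that occur here.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation", "Pd": "Dash Punctuation"}

char_counts = Counter()
for text, count in value_counts.items():
    for ch in text:
        char_counts[ch] += count  # each occurrence of the value contributes its characters

category_counts = Counter()
for ch, count in char_counts.items():
    code = unicodedata.category(ch)
    category_counts[CATEGORY_NAMES.get(code, code)] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

print(total_chars)            # 221783
print(dict(char_counts))
print(dict(category_counts))
print(round(mean_length, 8))  # 3.00079829
```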

Most occurring scripts

Value | Count | Frequency (%)
Common | 221783 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 221783 | 100.0%


tip_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 967
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.509278292
Minimum: -9.45
Maximum: 110
Zeros: 33957
Zeros (%): 45.9%
Negative: 4
Negative (%): < 0.1%
Memory size: 577.5 KiB

Quantile statistics

Minimum: -9.45
5th percentile: 0
Q1: 0
Median: 1.26
Q3: 2.75
95th percentile: 4
Maximum: 110
Range: 119.45
Interquartile range (IQR): 2.75

Descriptive statistics

Standard deviation: 1.774128974
Coefficient of variation (CV): 1.175481675
Kurtosis: 203.2600878
Mean: 1.509278292
Median Absolute Deviation (MAD): 1.26
Skewness: 4.937682598
Sum: 111547.74
Variance: 3.147533618
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 33957 | 45.9%
2.75 | 20547 | 27.8%
1 | 1307 | 1.8%
2 | 1213 | 1.6%
3 | 526 | 0.7%
1.66 | 322 | 0.4%
1.56 | 311 | 0.4%
1.46 | 296 | 0.4%
1.36 | 274 | 0.4%
1.96 | 253 | 0.3%
Other values (957) | 14902 | 20.2%

Minimum 10 values

Value | Count | Frequency (%)
-9.45 | 1 | < 0.1%
-1.14 | 3 | < 0.1%
0 | 33957 | 45.9%
0.01 | 64 | 0.1%
0.02 | 8 | < 0.1%
0.03 | 8 | < 0.1%
0.04 | 4 | < 0.1%
0.05 | 7 | < 0.1%
0.06 | 2 | < 0.1%
0.07 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
110 | 1 | < 0.1%
42 | 1 | < 0.1%
38 | 1 | < 0.1%
31.2 | 1 | < 0.1%
30 | 1 | < 0.1%
25 | 1 | < 0.1%
24.09 | 1 | < 0.1%
24.06 | 1 | < 0.1%
24 | 1 | < 0.1%
20.55 | 1 | < 0.1%

tolls_amount
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5169471505
Minimum: 0
Maximum: 31.25
Zeros: 67701
Zeros (%): 91.6%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 6.12
Maximum: 31.25
Range: 31.25
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.801222874
Coefficient of variation (CV): 3.484346266
Kurtosis: 20.60496277
Mean: 0.5169471505
Median Absolute Deviation (MAD): 0
Skewness: 3.9283023
Sum: 38206.53
Variance: 3.244403841
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=26)]

Common values

Value | Count | Frequency (%)
0 | 67701 | 91.6%
6.12 | 5345 | 7.2%
2.29 | 273 | 0.4%
2.8 | 241 | 0.3%
12.24 | 207 | 0.3%
8.41 | 51 | 0.1%
11.75 | 20 | < 0.1%
4.58 | 12 | < 0.1%
27.5 | 9 | < 0.1%
17.87 | 8 | < 0.1%
Other values (16) | 41 | 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 67701 | 91.6%
2 | 2 | < 0.1%
2.29 | 273 | 0.4%
2.8 | 241 | 0.3%
4.58 | 12 | < 0.1%
4.75 | 1 | < 0.1%
5.6 | 1 | < 0.1%
6.12 | 5345 | 7.2%
8 | 3 | < 0.1%
8.41 | 51 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
31.25 | 1 | < 0.1%
27.5 | 9 | < 0.1%
23.5 | 2 | < 0.1%
19.87 | 7 | < 0.1%
18.36 | 2 | < 0.1%
17.87 | 8 | < 0.1%
16.82 | 1 | < 0.1%
16.33 | 1 | < 0.1%
16.12 | 1 | < 0.1%
15.04 | 1 | < 0.1%

ehail_fee
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 73908
Missing (%): 100.0%
Memory size: 577.5 KiB

improvement_surcharge
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 577.5 KiB
Top values: 0.3 (73723), 0.0 (119), -0.3 (66)

Length

Max length: 4
Median length: 3
Mean length: 3.000893002
Min length: 3

Characters and Unicode

Total characters: 221,790
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.3
2nd row: 0.3
3rd row: 0.3
4th row: 0.3
5th row: 0.3

Common Values

Value | Count | Frequency (%)
0.3 | 73723 | 99.7%
0.0 | 119 | 0.2%
-0.3 | 66 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
0.3 | 73789 | 99.8%
0.0 | 119 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 74027 | 33.4%
. | 73908 | 33.3%
3 | 73789 | 33.3%
- | 66 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 147816 | 66.6%
Other Punctuation | 73908 | 33.3%
Dash Punctuation | 66 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 74027 | 50.1%
3 | 73789 | 49.9%

Other Punctuation
Value | Count | Frequency (%)
. | 73908 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 66 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 221790 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 221790 | 100.0%


total_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3746
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.14664597
Minimum: -280.3
Maximum: 280.3
Zeros: 70
Zeros (%): 0.1%
Negative: 72
Negative (%): 0.1%
Memory size: 577.5 KiB

Quantile statistics

Minimum: -280.3
5th percentile: 6.8
Q1: 12
Median: 20.64
Q3: 29
95th percentile: 53.47
Maximum: 280.3
Range: 560.6
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 14.80600905
Coefficient of variation (CV): 0.6396611011
Kurtosis: 6.822531653
Mean: 23.14664597
Median Absolute Deviation (MAD): 8.58
Skewness: 1.376921049
Sum: 1710722.31
Variance: 219.2179039
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
7.8 | 1065 | 1.4%
19.78 | 1061 | 1.4%
7.3 | 1023 | 1.4%
8.3 | 1009 | 1.4%
6.8 | 997 | 1.3%
8.8 | 944 | 1.3%
6.3 | 852 | 1.2%
18.5 | 842 | 1.1%
9.3 | 795 | 1.1%
9.8 | 790 | 1.1%
Other values (3736) | 64530 | 87.3%

Minimum 10 values

Value | Count | Frequency (%)
-280.3 | 1 | < 0.1%
-120.3 | 1 | < 0.1%
-52.8 | 1 | < 0.1%
-42.52 | 1 | < 0.1%
-28.3 | 1 | < 0.1%
-26.36 | 1 | < 0.1%
-25.3 | 3 | < 0.1%
-15.89 | 1 | < 0.1%
-15.3 | 1 | < 0.1%
-14.27 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
280.3 | 1 | < 0.1%
175 | 1 | < 0.1%
166.3 | 1 | < 0.1%
154.17 | 1 | < 0.1%
151.3 | 1 | < 0.1%
144.36 | 1 | < 0.1%
128.17 | 1 | < 0.1%
124.04 | 1 | < 0.1%
123.25 | 1 | < 0.1%
121.8 | 1 | < 0.1%

payment_type
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Memory size: 577.5 KiB
Top values: 1.0 (23041), 2.0 (14926), 3.0 (157), 4.0 (50), 5.0 (1)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 114,525
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2.0
2nd row: 1.0
3rd row: 1.0
4th row: 2.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 23041 | 31.2%
2.0 | 14926 | 20.2%
3.0 | 157 | 0.2%
4.0 | 50 | 0.1%
5.0 | 1 | < 0.1%
(Missing) | 35733 | 48.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
1.0 | 23041 | 60.4%
2.0 | 14926 | 39.1%
3.0 | 157 | 0.4%
4.0 | 50 | 0.1%
5.0 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
. | 38175 | 33.3%
0 | 38175 | 33.3%
1 | 23041 | 20.1%
2 | 14926 | 13.0%
3 | 157 | 0.1%
4 | 50 | < 0.1%
5 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 76350 | 66.7%
Other Punctuation | 38175 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 38175 | 50.0%
1 | 23041 | 30.2%
2 | 14926 | 19.5%
3 | 157 | 0.2%
4 | 50 | 0.1%
5 | 1 | < 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 38175 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 114525 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 114525 | 100.0%


trip_type
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Memory size: 577.5 KiB
Top values: 1.0 (37535), 2.0 (640)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 114,525
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 37535 | 50.8%
2.0 | 640 | 0.9%
(Missing) | 35733 | 48.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
1.0 | 37535 | 98.3%
2.0 | 640 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
. | 38175 | 33.3%
0 | 38175 | 33.3%
1 | 37535 | 32.8%
2 | 640 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 76350 | 66.7%
Other Punctuation | 38175 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 38175 | 50.0%
1 | 37535 | 49.2%
2 | 640 | 0.8%

Other Punctuation
Value | Count | Frequency (%)
. | 38175 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 114525 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 114525 | 100.0%


congestion_surcharge
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 35733
Missing (%): 48.3%
Memory size: 577.5 KiB
Top values: 0.0 (29228), 2.75 (8942), 2.5 (5)

Length

Max length: 4
Median length: 3
Mean length: 3.234237066
Min length: 3

Characters and Unicode

Total characters: 123,467
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 2.75
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 29228 | 39.5%
2.75 | 8942 | 12.1%
2.5 | 5 | < 0.1%
(Missing) | 35733 | 48.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
0.0 | 29228 | 76.6%
2.75 | 8942 | 23.4%
2.5 | 5 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 58456 | 47.3%
. | 38175 | 30.9%
2 | 8947 | 7.2%
5 | 8947 | 7.2%
7 | 8942 | 7.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 85292 | 69.1%
Other Punctuation | 38175 | 30.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 58456 | 68.5%
2 | 8947 | 10.5%
5 | 8947 | 10.5%
7 | 8942 | 10.5%

Other Punctuation
Value | Count | Frequency (%)
. | 38175 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 123467 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
ASCII | 123467 | 100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      58456  47.3%
.      38175  30.9%
2      8947   7.2%
5      8947   7.2%
7      8942   7.2%

duration
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3120
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.85257843
Minimum: 1
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 577.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 3.816666667
Q1: 8.05
Median: 14
Q3: 22.63333333
95th percentile: 41
Maximum: 60
Range: 59
Interquartile range (IQR): 14.58333333

Descriptive statistics

Standard deviation: 11.56316304
Coefficient of variation (CV): 0.6861361357
Kurtosis: 1.078105442
Mean: 16.85257843
Median Absolute Deviation (MAD): 6.716666667
Skewness: 1.187936546
Sum: 1245540.367
Variance: 133.7067395
Monotonicity: Not monotonic
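Several of the figures for duration follow from others by standard definitions. The sketch below recomputes them from the reported values, assuming CV = standard deviation / mean, variance = (standard deviation)², Sum = mean × number of observations, IQR = Q3 - Q1 and Range = Maximum - Minimum. Because the inputs are themselves rounded, agreement is only up to the last few printed digits.

```python
# Reported summary statistics for `duration` (minutes), copied from the report.
n_obs = 73908
mean = 16.85257843
std = 11.56316304
q1, q3 = 8.05, 22.63333333
minimum, maximum = 1, 60

# Derived statistics, using the standard definitions.
cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the squared standard deviation
total = mean * n_obs         # sum of all values
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(round(cv, 6))          # 0.686136
print(round(variance, 4))    # 133.7067
print(round(total, 1))       # 1245540.4
print(value_range)           # 59
```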
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
13                   1427   1.9%
10                   1425   1.9%
15                   1424   1.9%
11                   1421   1.9%
12                   1405   1.9%
14                   1374   1.9%
9                    1327   1.8%
16                   1277   1.7%
8                    1233   1.7%
17                   1207   1.6%
Other values (3110)  60388  81.7%
Minimum 10 values
Value        Count  Frequency (%)
1            75     0.1%
1.016666667  15     < 0.1%
1.033333333  10     < 0.1%
1.05         10     < 0.1%
1.066666667  9      < 0.1%
1.083333333  11     < 0.1%
1.1          13     < 0.1%
1.116666667  6      < 0.1%
1.133333333  14     < 0.1%
1.15         8      < 0.1%
Maximum 10 values
Value        Count  Frequency (%)
60           36     < 0.1%
59.98333333  2      < 0.1%
59.95        2      < 0.1%
59.93333333  1      < 0.1%
59.9         1      < 0.1%
59.86666667  1      < 0.1%
59.85        1      < 0.1%
59.83333333  1      < 0.1%
59.81666667  1      < 0.1%
59.78333333  1      < 0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the ranks of X and Y by the product of the ranks' standard deviations; in other words, ρ is Pearson's r computed on the ranks.
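A minimal plain-Python sketch of this definition follows. It assumes that tied values receive the average of their ranks (a common convention, but an assumption here), and the function names are illustrative only. The example pairs a variable with its cube: the relationship is nonlinear but perfectly monotonic, so ρ is 1 while Pearson's r is below 1.

```python
from statistics import mean, pstdev


def average_ranks(xs):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of (population) standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # nonlinear but perfectly monotonic
print(round(spearman(x, y), 10))  # 1.0
print(pearson(x, y) < 1)          # True
```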

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables, so for an exactly linear relationship the slope of the line affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
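The plain-Python sketch below (the function name `pearson_r` is hypothetical) follows that recipe and checks two of the properties stated above: shifting and positively rescaling each variable separately leaves r unchanged, and an exactly linear relationship gives r = -1 or +1 whatever its slope. It uses population (divide-by-n) moments; the normalization cancels in the ratio.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 8.0]
r = pearson_r(x, y)

# Separate shifts and positive rescalings of each variable leave r unchanged.
r_shifted = pearson_r([10 + 3 * v for v in x], [0.5 * v - 4 for v in y])

# An exactly linear relationship (here with slope -2) gives r = -1.
r_line = pearson_r(x, [5 - 2 * v for v in x])

print(abs(r - r_shifted) < 1e-12)  # True
print(round(r_line, 10))           # -1.0
```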

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
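A direct O(n²) sketch of this counting procedure follows (the function name is illustrative). It computes the τ-a variant described above, in which pairs tied in either variable count as neither concordant nor discordant. Library implementations may instead default to τ-b, which adjusts the denominator for ties; SciPy's `scipy.stats.kendalltau`, for example, uses `variant='b'` by default.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """tau-a: (concordant - discordant) / total number of pairs, n(n-1)/2."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0: the pair is tied in X or Y and counts as neither.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4 (7 concordant, 3 discordant, 10 pairs)
```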

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
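A stdlib-only sketch of the Bergsma (2013) correction, assuming its commonly cited form: φ̃² = max(0, φ² - (k-1)(r-1)/(n-1)), with corrected table dimensions r̃ = r - (r-1)²/(n-1) and k̃ = k - (k-1)²/(n-1), and Ṽ = sqrt(φ̃² / min(k̃-1, r̃-1)), where φ² = χ²/n. It uses the plain Pearson χ² statistic without a continuity correction, so it may differ in such details from the implementation behind this report; the function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramer's V (Bergsma 2013) for two nominal sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (pair_counts[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfectly associated columns give V close to 1; independent ones give 0.
print(round(cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50), 6))  # 1.0
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))  # 0.0
```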

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
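One common way to compute such a nullity correlation, and an assumption about what this heatmap shows, is Pearson's r between the two columns' missing-value indicators (1 where a value is missing, 0 otherwise). The sketch below uses small hypothetical columns named after fields in this dataset, with `None` marking a missing value; a column that is never (or always) missing has no variance, so its nullity correlation is undefined.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = sqrt(sum((y - mb) ** 2 for y in b) / n)
    if sa == 0 or sb == 0:
        return float("nan")  # never (or always) missing: correlation undefined
    return cov / (sa * sb)


# Hypothetical columns: payment_type and trip_type go missing on the same rows,
# while fare_amount is never missing.
payment_type = [2.0, 1.0, None, None, 1.0, None]
trip_type = [1.0, 1.0, None, None, 1.0, None]
fare_amount = [5.5, 10.0, 25.95, 29.45, 6.0, 71.83]

print(round(nullity_correlation(payment_type, trip_type), 6))  # 1.0
print(nullity_correlation(payment_type, fare_amount))          # nan
```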
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

   df_index  VendorID  lpep_pickup_datetime  lpep_dropoff_datetime  store_and_fwd_flag  RatecodeID  PULocationID  DOLocationID  passenger_count  trip_distance  fare_amount  extra  mta_tax  tip_amount  tolls_amount  ehail_fee  improvement_surcharge  total_amount  payment_type  trip_type  congestion_surcharge  duration
0  0         2         2021-01-01 00:15:56   2021-01-01 00:19:52    N                   1.0         43            151           1.0              1.01           5.5          0.5    0.5      0.00        0.0           None       0.3                    6.80          2.0           1.0        0.00                  3.933333
1  1         2         2021-01-01 00:25:59   2021-01-01 00:34:44    N                   1.0         166           239           1.0              2.53           10.0         0.5    0.5      2.81        0.0           None       0.3                    16.86         1.0           1.0        2.75                  8.750000
2  2         2         2021-01-01 00:45:57   2021-01-01 00:51:55    N                   1.0         41            42            1.0              1.12           6.0          0.5    0.5      1.00        0.0           None       0.3                    8.30          1.0           1.0        0.00                  5.966667
3  3         2         2020-12-31 23:57:51   2021-01-01 00:04:56    N                   1.0         168           75            1.0              1.99           8.0          0.5    0.5      0.00        0.0           None       0.3                    9.30          2.0           1.0        0.00                  7.083333
4  7         2         2021-01-01 00:26:31   2021-01-01 00:28:50    N                   1.0         75            75            6.0              0.45           3.5          0.5    0.5      0.96        0.0           None       0.3                    5.76          1.0           1.0        0.00                  2.316667
5  9         2         2021-01-01 00:58:32   2021-01-01 01:32:34    N                   1.0         225           265           1.0              12.19          38.0         0.5    0.5      2.75        0.0           None       0.3                    42.05         1.0           1.0        0.00                  34.033333
6  10        2         2021-01-01 00:31:14   2021-01-01 00:55:07    N                   1.0         244           244           2.0              3.39           18.0         0.5    0.5      0.00        0.0           None       0.3                    19.30         2.0           1.0        0.00                  23.883333
7  11        2         2021-01-01 00:08:50   2021-01-01 00:21:56    N                   1.0         75            213           1.0              6.69           19.5         0.5    0.5      0.00        0.0           None       0.3                    20.80         2.0           1.0        0.00                  13.100000
8  12        2         2021-01-01 00:35:13   2021-01-01 00:44:44    N                   1.0         74            238           1.0              2.34           10.0         0.5    0.5      0.00        0.0           None       0.3                    14.05         1.0           1.0        2.75                  9.516667
9  13        2         2021-01-01 00:39:57   2021-01-01 00:55:25    N                   1.0         74            60            1.0              5.48           18.0         0.5    0.5      0.00        0.0           None       0.3                    19.30         2.0           1.0        0.00                  15.466667

Last rows

       df_index  VendorID  lpep_pickup_datetime  lpep_dropoff_datetime  store_and_fwd_flag  RatecodeID  PULocationID  DOLocationID  passenger_count  trip_distance  fare_amount  extra  mta_tax  tip_amount  tolls_amount  ehail_fee  improvement_surcharge  total_amount  payment_type  trip_type  congestion_surcharge  duration
73898  76508     2         2021-01-31 20:17:00   2021-01-31 20:35:00    None                NaN         108           210           NaN              5.05           25.95        2.75   0.0      0.0         0.00          None       0.3                    29.00         NaN           NaN        NaN                   18.0
73899  76509     2         2021-01-31 20:23:00   2021-01-31 20:41:00    None                NaN         60            254           NaN              5.33           29.45        2.75   0.0      0.0         0.00          None       0.3                    32.50         NaN           NaN        NaN                   18.0
73900  76510     2         2021-01-31 21:09:00   2021-01-31 21:23:00    None                NaN         174           213           NaN              5.18           29.45        2.75   0.0      0.0         0.00          None       0.3                    32.50         NaN           NaN        NaN                   14.0
73901  76511     2         2021-01-31 21:33:00   2021-01-31 22:18:00    None                NaN         136           225           NaN              17.13          71.83        2.75   0.0      0.0         6.12          None       0.3                    81.00         NaN           NaN        NaN                   45.0
73902  76512     2         2021-01-31 21:58:00   2021-01-31 22:47:00    None                NaN         218           41            NaN              18.18          56.86        2.75   0.0      0.0         6.12          None       0.3                    66.03         NaN           NaN        NaN                   49.0
739037651322021-01-31 21:38:002021-01-31 22:16:00NoneNaN8190NaN17.6356.232.750.00.06.12None0.365.40NaNNaNNaN38.0
73904  76514     2         2021-01-31 22:43:00   2021-01-31 23:21:00    None                NaN         35            213           NaN              18.36          46.66        0.00   0.0      12.2        6.12          None       0.3                    65.28         NaN           NaN        NaN                   38.0
73905  76515     2         2021-01-31 22:16:00   2021-01-31 22:27:00    None                NaN         74            69            NaN              2.50           18.95        2.75   0.0      0.0         0.00          None       0.3                    22.00         NaN           NaN        NaN                   11.0
73906  76516     2         2021-01-31 23:10:00   2021-01-31 23:37:00    None                NaN         168           215           NaN              14.48          48.87        2.75   0.0      0.0         6.12          None       0.3                    58.04         NaN           NaN        NaN                   27.0
73907  76517     2         2021-01-31 23:25:00   2021-01-31 23:35:00    None                NaN         119           244           NaN              1.81           15.45        2.75   0.0      0.0         0.00          None       0.3                    18.50         NaN           NaN        NaN                   10.0